{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "46d6c164544f"
      },
      "outputs": [],
      "source": [
        "# Copyright 2020 Google LLC\n",
        "#\n",
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "#     https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2457083dc86a"
      },
      "source": [
        "# Deploying an auto-scaling model with AI Platform Prediction"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "44dd43009ccf"
      },
      "source": [
        "This notebook demonstrates how to deploy a pre-trained model to the AI Platform Prediction service. It creates a new model and a new model version. The model version has auto-scaling enabled, so nodes are added and removed as the load changes.\n",
        "\n",
        "We will use the [Universal Sentence Encoder](https://tfhub.dev/google/universal-sentence-encoder-large/5) model from TensorFlow Hub. This model encodes text into sentence embeddings and has been trained on a variety of data sources.\n",
        "\n",
        "The notebook itself is adapted from the Universal Sentence Encoder [sample notebook](https://colab.sandbox.google.com/github/tensorflow/hub/blob/master/examples/colab/semantic_similarity_with_tf_hub_universal_encoder.ipynb).\n",
        "\n",
        "The main changes to the sample notebook are:\n",
        "* Creation of AI Platform Prediction model and model version\n",
        "* Update to `embed()` function to use AI Platform Prediction for inference, rather than the local model\n",
        "* Streamlining of some non-essential content"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "e91bf1aa1b96"
      },
      "source": [
        "## Constants"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "d45211ab6470"
      },
      "outputs": [],
      "source": [
        "# Change these parameters!\n",
        "\n",
        "PROJECT = 'YOUR-PROJECT-ID'  # Update with your project\n",
        "BUCKET = 'gs://YOUR-BUCKET-NAME'  # Update with your bucket\n",
        "REGION = 'us-central1'  # Update with your region"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "b35d0e9c978c"
      },
      "outputs": [],
      "source": [
        "# These parameters don't need to be changed\n",
        "\n",
        "MODULE_URL = \"https://tfhub.dev/google/universal-sentence-encoder-large/5\"\n",
        "MODEL_NAME = 'universal_sentence_encoder'"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "e3f30142d3a2"
      },
      "source": [
        "## Imports"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "707d8192aa76"
      },
      "outputs": [],
      "source": [
        "from google.api_core.client_options import ClientOptions\n",
        "import googleapiclient.discovery\n",
        "\n",
        "import tensorflow_hub as hub\n",
        "\n",
        "import datetime\n",
        "import logging\n",
        "import numpy as np\n",
        "import seaborn as sns"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "551692f9f450"
      },
      "source": [
        "## Download TensorFlow Hub Model"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "46a3f6cd587c"
      },
      "outputs": [],
      "source": [
        "# Reduce logging output\n",
        "logging.getLogger(\"tensorflow\").setLevel(logging.ERROR)\n",
        "\n",
        "# Download model and return path\n",
        "model = hub.resolve(MODULE_URL)\n",
        "\n",
        "print(f\"model file {model} saved\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "32124f591fed"
      },
      "source": [
        "## Deploy AI Platform Prediction model and model version"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "910a5cc13ba0"
      },
      "outputs": [],
      "source": [
        "# Create AI Platform Prediction model\n",
        "\n",
        "!gcloud ai-platform models create '{MODEL_NAME}' \\\n",
        "  --region='{REGION}'"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "28c21b33f170"
      },
      "outputs": [],
      "source": [
        "# Create model version string with the current datetime\n",
        "\n",
        "now = datetime.datetime.now()\n",
        "MODEL_VERSION = 'v' + now.strftime('%m%d%Y%H%M%S')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bf6f7118d7f1"
      },
      "source": [
        "By default, the service uses **60% utilization** as the target when deciding whether to add or remove nodes. You can change this by setting metric targets for CPU usage, GPU duty cycle, or both. In this notebook, we write these parameters to a `config.yaml` file, which is then passed to the `gcloud` CLI with the `--config` flag.\n",
        "\n",
        "Alternatively, you can use [gcloud beta ai-platform versions create](https://cloud.google.com/sdk/gcloud/reference/beta/ai-platform/versions/create#--metric-targets) to specify the parameters directly without `config.yaml`:\n",
        "```\n",
        "  --metric-targets cpu-usage=80 \\\n",
        "  --metric-targets gpu-duty-cycle=80 \\\n",
        "  --min-nodes 2 \\\n",
        "  --max-nodes 4\n",
        "```"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "4cd131ff0a11"
      },
      "outputs": [],
      "source": [
        "# Write scaling parameters to config.yaml\n",
        "\n",
        "CONFIG = '''autoScaling:\n",
        "  minNodes: 2\n",
        "  maxNodes: 4\n",
        "  metrics:\n",
        "    - name: CPU_USAGE\n",
        "      target: 80\n",
        "    - name: GPU_DUTY_CYCLE\n",
        "      target: 80'''\n",
        "\n",
        "!echo '{CONFIG}' > config.yaml"
      ]
    },
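    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To make the utilization targets above more concrete, here is a minimal sketch of a generic target-tracking rule: scale the node count by the ratio of observed to target utilization, then clamp it to the `minNodes`/`maxNodes` range. This is an assumption for illustration, similar in spirit to the formula used by the Kubernetes Horizontal Pod Autoscaler; it is not the algorithm AI Platform Prediction actually runs, and `desired_nodes` is a hypothetical helper.\n",
        "\n",
        "```python\n",
        "import math\n",
        "\n",
        "\n",
        "def desired_nodes(current_nodes, utilization, target, min_nodes, max_nodes):\n",
        "    # Hypothetical target-tracking rule, not the service's real algorithm.\n",
        "    # Scale the node count by the ratio of observed to target utilization.\n",
        "    proposed = math.ceil(current_nodes * utilization / target)\n",
        "    # Clamp the result to the configured bounds.\n",
        "    return max(min_nodes, min(max_nodes, proposed))\n",
        "\n",
        "\n",
        "# Values mirror config.yaml: target 80, minNodes 2, maxNodes 4.\n",
        "print(desired_nodes(2, 95, 80, 2, 4))\n",
        "print(desired_nodes(4, 20, 80, 2, 4))\n",
        "```\n",
        "\n",
        "With a target of 80, two nodes at 95% utilization would grow to three under this rule, while four nodes at 20% would shrink to the minimum of two."
      ]
    },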
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "0cce2831ecc8"
      },
      "outputs": [],
      "source": [
        "# Create a new model version. This may take several minutes.\n",
        "\n",
        "!gcloud ai-platform versions create {MODEL_VERSION} \\\n",
        "  --model={MODEL_NAME} \\\n",
        "  --region={REGION} \\\n",
        "  --origin={model} \\\n",
        "  --staging-bucket={BUCKET} \\\n",
        "  --runtime-version=2.2 \\\n",
        "  --framework='TENSORFLOW' \\\n",
        "  --python-version=3.7 \\\n",
        "  --machine-type=n1-standard-4 \\\n",
        "  --accelerator count=1,type=nvidia-tesla-t4 \\\n",
        "  --config=config.yaml"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cc46b845bc74"
      },
      "source": [
        "## Use service to make predictions"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "28197d7059f5"
      },
      "outputs": [],
      "source": [
        "# Initialize client\n",
        "\n",
        "endpoint = f'https://{REGION}-ml.googleapis.com'  # Use regional endpoint\n",
        "client_options = ClientOptions(api_endpoint=endpoint)\n",
        "service = googleapiclient.discovery.build('ml', 'v1', client_options=client_options, cache_discovery=False)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "4d7e59397cf5"
      },
      "outputs": [],
      "source": [
        "# Helper function to invoke the prediction service, adapted from\n",
        "# https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/ml_engine/online_prediction/predict.py\n",
        "\n",
        "def predict_json(project, model, instances, version=None):\n",
        "    \"\"\"Send JSON data to a deployed model for prediction.\n",
        "    Args:\n",
        "        project (str): project where the AI Platform Model is deployed.\n",
        "        model (str): model name.\n",
        "        instances ([Mapping[str: Any]]): Keys should be the names of Tensors\n",
        "            your deployed model expects as inputs. Values should be datatypes\n",
        "            convertible to Tensors, or (potentially nested) lists of datatypes\n",
        "            convertible to tensors.\n",
        "        version (str): version of the model to target. If None, the\n",
        "            model's default version is used.\n",
        "    Returns:\n",
        "        Mapping[str: any]: dictionary of prediction results defined by the\n",
        "            model.\n",
        "    \"\"\"\n",
        "\n",
        "    name = 'projects/{}/models/{}'.format(project, model)\n",
        "\n",
        "    if version is not None:\n",
        "        name += '/versions/{}'.format(version)\n",
        "\n",
        "    response = service.projects().predict(\n",
        "        name=name,\n",
        "        body={'instances': instances}\n",
        "    ).execute()\n",
        "\n",
        "    if 'error' in response:\n",
        "        raise RuntimeError(response['error'])\n",
        "\n",
        "    return response['predictions']"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "cc41909596d1"
      },
      "outputs": [],
      "source": [
        "def embed(texts):\n",
        "    # No version is passed, so the request goes to the model's default version.\n",
        "    return predict_json(PROJECT, MODEL_NAME, texts)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "dabbe79179fd"
      },
      "outputs": [],
      "source": [
        "# Helper functions for plotting\n",
        "\n",
        "def plot_similarity(labels, features, rotation):\n",
        "    corr = np.inner(features, features)\n",
        "    sns.set(font_scale=1.2)\n",
        "    g = sns.heatmap(\n",
        "        corr,\n",
        "        xticklabels=labels,\n",
        "        yticklabels=labels,\n",
        "        vmin=0,\n",
        "        vmax=1,\n",
        "        cmap=\"YlOrRd\")\n",
        "    g.set_xticklabels(labels, rotation=rotation)\n",
        "    g.set_title(\"Semantic Textual Similarity\")\n",
        "\n",
        "\n",
        "def run_and_plot(messages_):\n",
        "    message_embeddings_ = embed(messages_)\n",
        "    plot_similarity(messages_, message_embeddings_, 90)"
      ]
    },
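    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`plot_similarity` uses `np.inner` as the similarity score. For unit-length vectors the inner product equals cosine similarity, so this works as long as the embeddings are approximately normalized (an assumption here). The sketch below illustrates the idea on made-up 3-dimensional vectors, not real model output; `cosine_similarity_matrix` is a hypothetical helper that normalizes explicitly, so it does not rely on that assumption.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "\n",
        "def cosine_similarity_matrix(features):\n",
        "    # Hypothetical helper: pairwise cosine similarity of row vectors.\n",
        "    features = np.asarray(features, dtype=float)\n",
        "    # Divide each row by its L2 norm so every vector has unit length.\n",
        "    unit = features / np.linalg.norm(features, axis=1, keepdims=True)\n",
        "    # For unit vectors, the inner product is the cosine of the angle.\n",
        "    return np.inner(unit, unit)\n",
        "\n",
        "\n",
        "# Made-up vectors for illustration, not real embeddings.\n",
        "toy = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]\n",
        "print(cosine_similarity_matrix(toy))\n",
        "```\n",
        "\n",
        "Parallel vectors score 1 and orthogonal vectors score 0, which matches the `vmin=0`, `vmax=1` range used by the heatmap."
      ]
    },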
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "541fbe4f4b38"
      },
      "outputs": [],
      "source": [
        "# Plot the textual similarity between various messages\n",
        "\n",
        "messages = [\n",
        "    # Smartphones\n",
        "    \"I like my phone\",\n",
        "    \"My phone is not good.\",\n",
        "    \"Your cellphone looks great.\",\n",
        "\n",
        "    # Weather\n",
        "    \"Will it snow tomorrow?\",\n",
        "    \"Recently a lot of hurricanes have hit the US\",\n",
        "    \"Global warming is real\",\n",
        "\n",
        "    # Food and health\n",
        "    \"An apple a day, keeps the doctors away\",\n",
        "    \"Eating strawberries is healthy\",\n",
        "    \"Is paleo better than keto?\",\n",
        "\n",
        "    # Asking about age\n",
        "    \"How old are you?\",\n",
        "    \"what is your age?\",\n",
        "]\n",
        "\n",
        "run_and_plot(messages)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "21174c0b301c"
      },
      "source": [
        "## Cleanup"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "0460a320f991"
      },
      "outputs": [],
      "source": [
        "# Delete model version resource\n",
        "!gcloud ai-platform versions delete {MODEL_VERSION} --model {MODEL_NAME} --region {REGION} --quiet\n",
        "\n",
        "# Delete model resource\n",
        "!gcloud ai-platform models delete {MODEL_NAME} --region {REGION} --quiet"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "name": "ai_platform_prediction_auto-scaling.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
